12 research outputs found

    Contributions to probabilistic non-negative matrix factorization - Maximum marginal likelihood estimation and Markovian temporal models

    Get PDF
    Non-negative matrix factorization (NMF) has become a popular dimensionality reduction technique, and has found applications in many different fields, such as audio signal processing, hyperspectral imaging, or recommender systems. In its simplest form, NMF aims at finding an approximation of a non-negative data matrix (i.e., with non-negative entries) as the product of two non-negative matrices, called the factors. One of these two matrices can be interpreted as a dictionary of characteristic patterns of the data, and the other one as activation coefficients of these patterns. This low-rank approximation is traditionally retrieved by optimizing a measure of fit between the data matrix and its approximation. As it turns out, for many choices of measures of fit, the problem can be shown to be equivalent to the joint maximum likelihood estimation of the factors under a certain statistical model describing the data. This leads us to an alternative paradigm for NMF, where the learning task revolves around probabilistic models whose observation density is parametrized by the product of non-negative factors. This general framework, coined probabilistic NMF, encompasses many well-known latent variable models of the literature, such as models for count data. In this thesis, we consider specific probabilistic NMF models in which a prior distribution is assumed on the activation coefficients, but the dictionary remains a deterministic variable. The objective is then to maximize the marginal likelihood in these semi-Bayesian NMF models, i.e., the joint likelihood integrated over the activation coefficients. This amounts to learning the dictionary only; the activation coefficients may be inferred in a second step if necessary. We proceed to study in greater depth the properties of this estimation process. In particular, two scenarios are considered. In the first one, we assume the independence of the activation coefficients sample-wise. Previous experimental work showed that dictionaries learned with this approach exhibited a tendency to automatically regularize the number of components, a favorable property which was left unexplained. In the second one, we lift this standard assumption, and consider instead Markov structures to add statistical correlation to the model, in order to better analyze temporal data.
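    As a concrete reference for the estimation problem described above, the semi-Bayesian setup can be written as follows; the notation (V for the data, W for the dictionary, H for the activations) is ours and is only meant as a sketch, not the thesis's exact formulation.

```latex
% Semi-Bayesian NMF in generic notation (ours, not necessarily the thesis's):
% V is an F x N non-negative data matrix, W an F x K dictionary,
% H a K x N activation matrix with prior p(H); W is deterministic.
\[
  V \approx W H , \qquad H \sim p(H) .
\]
% Maximum marginal likelihood estimation of the dictionary integrates H out:
\[
  \hat{W} \in \operatorname*{arg\,max}_{W \ge 0} \, p(V \mid W)
          = \operatorname*{arg\,max}_{W \ge 0} \int p(V \mid W, H) \, p(H) \, \mathrm{d}H .
\]
% The activations may then be inferred in a second step from p(H | V, \hat{W}).
```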

    Closed-form Marginal Likelihood in Gamma-Poisson Matrix Factorization

    Get PDF
    We present a new understanding of the Gamma-Poisson (GaP) model, a probabilistic matrix factorization model for count data. We show that GaP can be rewritten free of the score/activation matrix. This yields new insights into the estimation of the topic/dictionary matrix by maximum marginal likelihood estimation. In particular, this explains the robustness of this estimator to over-specified values of the factorization rank, especially its ability to automatically prune irrelevant dictionary columns, as empirically observed in previous work. The marginalization of the activation matrix leads in turn to a new Monte Carlo Expectation-Maximization algorithm with favorable properties.
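    For reference, a common parameterization of the GaP model reads as follows; the shape/rate convention and the symbols are illustrative assumptions rather than the paper's exact notation.

```latex
% Gamma-Poisson (GaP) factorization of an F x N count matrix V = (v_{fn}),
% with dictionary W = (w_{fk}) and activations H = (h_{kn}).
% Shape/rate convention and symbols are illustrative assumptions.
\[
  h_{kn} \sim \operatorname{Gamma}(\alpha_k, \beta_k) , \qquad
  v_{fn} \mid W, H \sim \operatorname{Poisson}\Big( \textstyle\sum_{k=1}^{K} w_{fk} h_{kn} \Big) .
\]
% Marginalizing H yields the marginal likelihood p(V | W) studied in the paper;
% maximizing it is what prunes irrelevant columns of W under rank over-specification.
```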

    Bayesian mean-parameterized nonnegative binary matrix factorization

    No full text
    Binary data matrices can represent many types of data, such as social networks, votes, or gene expression. In some cases, the analysis of binary matrices can be tackled with nonnegative matrix factorization (NMF), where the observed data matrix is approximated by the product of two smaller nonnegative matrices. In this context, probabilistic NMF assumes a generative model where the data is usually Bernoulli-distributed. Often, a link function is used to map the factorization to the [0, 1] range, ensuring a valid Bernoulli mean parameter. However, link functions have the potential disadvantage of leading to uninterpretable models. Mean-parameterized NMF, on the contrary, overcomes this problem. We propose a unified framework for Bayesian mean-parameterized nonnegative binary matrix factorization models (NBMF). We analyze three models which correspond to three possible constraints that respect the mean-parameterization without the need for link functions. Furthermore, we derive a novel collapsed Gibbs sampler and a collapsed variational algorithm to infer the posterior distribution of the factors. Next, we extend the proposed models to a nonparametric setting where the number of used latent dimensions is automatically driven by the observed data. We analyze the performance of our NBMF methods on multiple datasets for different tasks, such as dictionary learning and prediction of missing data. Experiments show that our methods provide results similar or superior to the state of the art, while automatically detecting the number of relevant components.
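    A minimal way to write the mean-parameterized Bernoulli model discussed above; the notation is ours, and the constraint shown is only one of several possibilities, not necessarily one of the paper's three models.

```latex
% Mean-parameterized Bernoulli NMF for a binary F x N matrix V (our notation):
% the factorization itself is the Bernoulli mean, so no link function is needed,
% provided the factors keep [WH]_{fn} inside [0, 1].
\[
  v_{fn} \mid W, H \sim \operatorname{Bernoulli}\big( [W H]_{fn} \big) ,
  \qquad 0 \le [W H]_{fn} \le 1 .
\]
% One sufficient constraint among several: w_{fk} \in [0, 1] and each column of H
% on the simplex (h_{kn} \ge 0, \sum_k h_{kn} = 1), so that [WH]_{fn} is a convex
% combination of values in [0, 1].
```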

    Approximate Bayesian Computation with Domain Expert in the Loop

    No full text
    Approximate Bayesian computation (ABC) is a popular likelihood-free inference method for models with intractable likelihood functions. As ABC methods usually rely on comparing summary statistics of observed and simulated data, the choice of the statistics is crucial. This choice involves a trade-off between loss of information and dimensionality reduction, and is often determined based on domain knowledge. However, handcrafting and selecting suitable statistics is a laborious task involving multiple trial-and-error steps. In this work, we introduce an active learning method for ABC statistics selection which considerably reduces the domain expert's work. By involving the expert, we are able to handle misspecified models, unlike existing dimension reduction methods. Moreover, empirical results show better posterior estimates than existing methods when the simulation budget is limited.
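    Since the abstract presumes familiarity with summary-statistics-based ABC, here is a minimal rejection-ABC sketch in Python. It illustrates the generic baseline the paper builds on, not the paper's active learning method; the prior, simulator, and summary function are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def prior():                      # placeholder prior over the parameter
    return rng.uniform(0.0, 10.0)

def simulate(theta, n=100):       # placeholder simulator (likelihood intractable)
    return rng.normal(theta, 1.0, size=n)

def summary(x):                   # hand-picked summary statistics: mean and std
    return np.array([x.mean(), x.std()])

def abc_rejection(observed, n_samples=10_000, eps=0.5):
    """Keep parameters whose simulated summaries fall within eps of the data's."""
    s_obs, accepted = summary(observed), []
    for _ in range(n_samples):
        theta = prior()
        if np.linalg.norm(summary(simulate(theta)) - s_obs) < eps:
            accepted.append(theta)
    return np.array(accepted)     # samples from the approximate posterior

posterior = abc_rejection(simulate(3.0))
print(posterior.mean(), posterior.std())
```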

    An Empirical Study of Steganography and Steganalysis of Color Images in the JPEG Domain

    No full text
    This paper tackles the problem of JPEG steganography and steganalysis for color images, a problem that has rarely been studied so far and deserves more attention. Focusing on the 4:4:4 sampling strategy, we propose to modify, for each channel, the embedding rate of the J-UNIWARD and UERD steganographic schemes in order to arbitrarily spread the payload between the luminance and chrominance components while keeping a constant message size across the different strategies. We compare our payload-spreading strategy against two baselines: (i) the concatenation of the cost maps (CONC) and (ii) equal embedding rates (EER) among channels. We then select good candidates within the feature sets designed either for JPEG or color steganography. Our conclusions are threefold: (i) the GFR and DCTR feature sets, concatenated over the three channels, offer better performance than ColorSRMQ1 for JPEG quality factors (QF) 75 and 95, but ColorSRMQ1 is more sensitive for QF=100; (ii) the CONC and EER strategies are suboptimal; and (iii) depending on the quality factor and the embedding scheme, the empirical security is maximized when between 33% (QF=100, UERD) and 95% (QF=75, J-UNIWARD) of the payload is allocated to the luminance channel.
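    To illustrate the payload-spreading idea in code (our own toy arithmetic, not the paper's implementation): a fraction rho of a fixed total message size goes to the luminance channel and the remainder is split between the two chrominance channels, so per-channel embedding rates vary while the total stays constant.

```python
def spread_payload(total_bits, rho, n_luma_coeffs, n_chroma_coeffs):
    """Toy sketch of payload spreading between Y, Cb, Cr (illustrative only).

    total_bits     : fixed total message size, kept constant across strategies
    rho            : fraction of the payload allocated to the luminance channel
    n_*_coeffs     : usable DCT coefficients per channel (equal for Cb and Cr,
                     and equal to luminance under 4:4:4 sampling)
    Returns per-channel embedding rates in bits per coefficient.
    """
    luma_bits = rho * total_bits
    chroma_bits = (1.0 - rho) * total_bits / 2.0   # split evenly between Cb and Cr
    return {
        "Y":  luma_bits / n_luma_coeffs,
        "Cb": chroma_bits / n_chroma_coeffs,
        "Cr": chroma_bits / n_chroma_coeffs,
    }

# e.g. the 95% luminance allocation reported best for QF=75 with J-UNIWARD:
print(spread_payload(total_bits=65536, rho=0.95,
                     n_luma_coeffs=262144, n_chroma_coeffs=262144))
```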

    A Comparative Study of Gamma Markov Chains for Temporal Non-Negative Factorization

    Get PDF
    Non-negative matrix factorization (NMF) has become a well-established class of methods for the analysis of non-negative data. In particular, a lot of effort has been devoted to probabilistic NMF, namely estimation or inference tasks in probabilistic models describing the data, based for example on Poisson or exponential likelihoods. When dealing with time series data, several works have proposed to model the evolution of the activation coefficients as a non-negative Markov chain, most of the time in relation with the Gamma distribution, giving rise to so-called temporal NMF models. In this paper, we review four Gamma Markov chains of the NMF literature and show that they all share the same drawback: the absence of a well-defined stationary distribution. We then introduce a fifth process, an overlooked model from the time series literature named BGAR(1), which overcomes this limitation. These temporal NMF models are then compared in a MAP framework on a prediction task, in the context of the Poisson likelihood.
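    For context, here is a minimal simulation of the BGAR(1) process mentioned above, under the standard beta-gamma construction (the parameter names are ours): with h_1 ~ Gamma(a, b), B_t ~ Beta(a*rho, a*(1-rho)) and E_t ~ Gamma(a*(1-rho), b), the recursion h_t = B_t * h_{t-1} + E_t leaves the Gamma(a, b) distribution stationary.

```python
import numpy as np

def bgar1(T, a, b, rho, rng):
    """Simulate a BGAR(1) chain (parameter names are ours, not the paper's).

    Stationary marginal: Gamma(shape=a, rate=b); rho in (0, 1) controls
    the temporal correlation.
    """
    h = np.empty(T)
    h[0] = rng.gamma(a, 1.0 / b)                 # start at stationarity
    for t in range(1, T):
        B = rng.beta(a * rho, a * (1.0 - rho))   # beta thinning of the past
        E = rng.gamma(a * (1.0 - rho), 1.0 / b)  # gamma innovation
        h[t] = B * h[t - 1] + E                  # stays Gamma(a, b)-distributed
    return h

rng = np.random.default_rng(0)
h = bgar1(T=100_000, a=2.0, b=1.0, rho=0.9, rng=rng)
print(h.mean(), h.var())   # should be close to a/b = 2 and a/b**2 = 2
```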

    Multi-Fidelity Bayesian Optimization with Unreliable Information Sources

    No full text
    Bayesian optimization (BO) is a powerful framework for optimizing black-box, expensive-to-evaluate functions. Over the past decade, many algorithms have been proposed to integrate cheaper, lower-fidelity approximations of the objective function into the optimization process, with the goal of converging towards the global optimum at a reduced cost. This task is generally referred to as multi-fidelity Bayesian optimization (MFBO). However, MFBO algorithms can lead to higher optimization costs than their vanilla BO counterparts, especially when the low-fidelity sources are poor approximations of the objective function, thereby defeating their purpose. To address this issue, we propose rMFBO (robust MFBO), a methodology that makes any GP-based MFBO scheme robust to the addition of unreliable information sources. rMFBO comes with a theoretical guarantee that its performance can be bounded relative to its vanilla BO analog with controllably high probability. We demonstrate the effectiveness of the proposed methodology on a number of numerical benchmarks, outperforming earlier MFBO methods on unreliable sources. We expect rMFBO to be particularly useful for reliably including human experts with varying knowledge within BO processes.
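    As a reference point for the setting described above (our notation, not the paper's): MFBO seeks the maximizer of an expensive objective f while also being allowed to query cheaper, possibly unreliable approximations at lower cost.

```latex
% Multi-fidelity BO setting in our own notation: f is the expensive target;
% f_1, ..., f_K are auxiliary information sources (approximations of f)
% with query costs c_1, ..., c_K, typically much smaller than the cost of f.
\[
  x^\star \in \operatorname*{arg\,max}_{x \in \mathcal{X}} f(x) ,
\]
% where each iteration picks a pair (x, k), i.e. a location and a source,
% trading off information gained about f against the cost c_k paid.
% rMFBO's guarantee ties its performance to that of vanilla BO (which queries
% only f) with high, controllable probability, even when some f_k are unreliable.
```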